sample complexity
Product distribution learning with imperfect advice
We revisit this problem when the learner is also given as advice the parameters of a product distribution Q. We show that there is an efficient algorithm to learn P within TV distance ฮตthat has sample complexity O(d1 ฮท/ฮต2), if p q 1 < ฮตd0.5 โฆ(ฮท). Here, p and q are the mean vectors of P and Q respectively, and no bound on p q 1 is known to the algorithm a priori.
Agnostic Active Learning Is Always Better Than Passive Learning
We provide the first sharp characterization of the optimal first-order query complexity of agnostic active learning, and propose a new general active learning algorithm which achieves it. Remarkably, the optimal query complexity admits a leading term which is always strictly smaller than the sample complexity of passive supervised learning (by a factor proportional to the best-in-class error rate). This was not previously known to be possible. For comparison, in all previous general analyses, the leading term exhibits an additional factor, such as the disagreement coefficient or related complexity measures, and therefore only provides improvements over passive learning in restricted cases. The present work completely removes such factors from the leading term, implying that every concept class benefits from active learning in the non-realizable case. Whether such benefits are possible has been the driving question underlying the past two decades of research on the theory of agnostic active learning. This work finally settles this fundamental question.
Stabilizing LTISystems under Partial Observability: Sample Complexity and Fundamental Limits
We study the problem of stabilizing an unknown partially observable linear timeinvariant (LTI) system. For fully observable systems, the state-of-the-art approaches leverage an unstable/stable subspace decomposition to achieve sample complexity that depends only on the number of unstable modes, independent of the dimension of the system state. However, it remains open whether such sample complexity can be achieved for partially observable systems because such systems do not admit a uniquely identifiable unstable subspace. In this paper, we propose LTS-P, a novel technique that leverages compressed singular value decomposition (SVD) on the "lifted" Hankel matrix to estimate the unstable subsystem up to an unknown transformation.
Revisiting Agnostic Boosting
Boosting is a key method in statistical learning, allowing for converting weak learners into strong ones. While well studied in the realizable case, the statistical properties of weak-to-strong learning remain less understood in the agnostic setting, where there are no assumptions on the distribution of the labels. In this work, we propose a new agnostic boosting algorithm with substantially improved sample complexity compared to prior works under very general assumptions. Our approach is based on a reduction to the realizable case, followed by a margin-based filtering of high-quality hypotheses. Furthermore, we show a nearly-matching lower bound, settling the sample complexity of agnostic boosting up to logarithmic factors.
Simple and Optimal Sublinear Algorithms for Mean Estimation
We study the sublinear multivariate mean estimation problem in d-dimensional Euclidean space. Specifically, we aim to find the mean ยต of a ground point set A, which minimizes the sum of squared Euclidean distances of the points in Ato ยต. We first show that a multiplicative (1 + ฮต) approximation to ยต can be found with probability 1 ฮด using O(ฮต 1 logฮด 1)many independent uniform random samples, and provide a matching lower bound. Furthermore, we give two estimators with optimal sample complexity that can be computed in optimal running time for extracting a suitable approximate mean: 1.
On the sample complexity of semi-supervised multi-objective learning
In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class G with larger capacity than what is necessary for solving the individual tasks. This, in turn, increases the statistical cost, as reflected in known MOL bounds that depend on the complexity of G. We show that this cost is unavoidable for some losses, even in an idealized semi-supervised setting, where the learner has access to the Bayes-optimal solutions for the individual tasks as well as the marginal distributions over the covariates. On the other hand, for objectives defined with Bregman losses, we prove that the complexity of G may come into play only in terms of unlabeled data. Concretely, we establish sample complexity upper bounds, showing precisely when and how unlabeled data can significantly alleviate the need for labeled data. This is achieved by a simple pseudo-labeling algorithm.
Streaming Federated Learning with Markovian Data
Federated learning (FL) is now recognized as a key framework for communicationefficient collaborative learning. Most theoretical and empirical studies, however, rely on the assumption that clients have access to pre-collected data sets, with limited investigation into scenarios where clients continuously collect data. In many real-world applications, particularly when data is generated by physical or biological processes, client data streams are often modeled by non-stationary Markov processes.
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Although there is an extensive body of work characterizing the sample complexity of discounted-return offline RL with function approximations, prior work on the average-reward setting has received significantly less attention, and existing approaches rely on restrictive assumptions, such as ergodicity or linearity of the MDP. In this work, we establish the first sample complexity results for average-reward offline RL with function approximation for weakly communicating MDPs, a much milder assumption. To this end, we introduce Anchored Fitted Q-Iteration, which combines the standard Fitted Q-Iteration with an anchor mechanism. We show that the anchor, which can be interpreted as a form of weight decay, is crucial for enabling finite-time analysis in the average-reward setting. We also extend our finite-time analysis to the setup where the dataset is generated from a singletrajectory rather than IID transitions, again leveraging the anchor mechanism.
Learning Juntas under Markov Random Fields
We give an algorithm for learning O(logn)juntas in polynomial-time with respect to Markov Random Fields (MRFs) in a smoothed analysis framework where only the external field has been randomly perturbed. This is a broad generalization1 of the work of Kalai and Teng, who gave an algorithm that succeeded with respect to smoothed product distributions (i.e., MRFs whose dependency graph has no edges). Our algorithm has two phases: (1) an unsupervised structure learning phase and (2) a greedy supervised learning algorithm. This is the first example where algorithms for learning the structure of undirected graphical models have downstream applications to supervised learning.
a6e072cfc12794cba1e861f57be8f4de-Paper-Conference.pdf
We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.